Exploring the Imported Data

Md Zulquar Nain

Importing the Data File

# importing data from `csv` file
datai <- read.csv("hsbraw.csv")
  • datai: name of the data frame created in R from the imported file

  • hsbraw.csv: name of the CSV file being imported (read from the current working directory unless a path is given)
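The structure output in the next section shows that science and socst use -99 as a code for missing scores. read.csv() can turn such codes into NA at import time through its na.strings argument. The sketch below is self-contained: it reads a few invented rows from a string via `text =` instead of the real hsbraw.csv, and it assumes -99 is only ever a missing-value code, never a real score.

```r
# Invented rows standing in for hsbraw.csv, supplied as text so the
# example runs without the file on disk
csv_lines <- "id,gender,science,socst
3,male,63,56
9,female,-99,-99
10,female,53,61"

# Keep "NA" in na.strings: setting the argument replaces its default value
demo <- read.csv(text = csv_lines, na.strings = c("NA", "-99"))
str(demo)   # science and socst are now integer columns containing NA
```

For the real file the same idea would read `datai <- read.csv("hsbraw.csv", na.strings = c("NA", "-99"))`.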

Exploring the Dataset I

  • Class, structure and dimension of the dataset
# Structure of the data
str(datai)
'data.frame':   189 obs. of  9 variables:
 $ id     : int  3 4 5 6 7 8 9 10 11 12 ...
 $ gender : chr  "male" "female" "male" "female" ...
 $ schtyp : chr  "public" "public" "public" "public" ...
 $ prog   : chr  "academic" "academic" "academic" "academic" ...
 $ read   : int  63 44 47 47 57 39 48 47 34 37 ...
 $ write  : int  65 50 40 41 54 44 49 54 46 44 ...
 $ math   : int  48 41 43 46 59 52 52 49 45 45 ...
 $ science: int  63 39 45 40 47 44 -99 53 39 39 ...
 $ socst  : int  56 51 31 41 51 48 -99 61 36 46 ...
# Class of the data
class(datai)
[1] "data.frame"
# Dimension of the data
dim(datai)
[1] 189   9
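If the file has already been imported with the -99 codes still present, they can be recoded afterwards. This sketch uses a small invented data frame. It assumes, as above, that -99 never occurs as a genuine value in any column, because it recodes every matching cell.

```r
# Invented scores using the same -99 code seen in the str() output
scores <- data.frame(science = c(63, 39, -99, 53),
                     socst   = c(56, 51, -99, 61))

summary(scores$science)       # a minimum of -99 flags the sentinel code

# Replace the sentinel with NA wherever it appears
scores[scores == -99] <- NA

colSums(is.na(scores))        # number of missing values per column
mean(scores$science, na.rm = TRUE)   # statistics now ignore the missing row
```

Without the recode, mean() would silently include -99 and understate the average score.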

Exploring the Dataset II

  • First n rows of the data set
    • head(df, n), where df is the data frame (n defaults to 6)
  • Last n rows of the data set
    • tail(df, n)
# View top two rows of the data
head(datai,2)
  id gender schtyp     prog read write math science socst
1  3   male public academic   63    65   48      63    56
2  4 female public academic   44    50   41      39    51
# View bottom two rows 
tail(datai,2)
     id gender  schtyp     prog read write math science socst
188 199   male private academic   52    59   50      61    61
189 200   male private academic   68    54   75      66    66
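head() and tail() also accept a negative n, which drops rows from the opposite end instead of keeping them. A short sketch on an invented data frame (toy is a hypothetical name):

```r
# Small invented data frame so the example runs on its own
toy <- data.frame(id = 1:6, read = c(63, 44, 47, 47, 57, 39))

head(toy, 2)     # first 2 rows
head(toy, -2)    # all rows except the last 2
tail(toy, -4)    # all rows except the first 4
toy[1:2, ]       # the same rows as head(toy, 2), by explicit indexing
nrow(toy)        # total number of rows
```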

Creating New Variables

  • Adding two variables to create a new one
  • Creating a new variable by taking the average of two variables
# Sum and average of read and write, stored as standalone vectors
ssum <- datai$read + datai$write
savg <- (datai$read + datai$write) / 2
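  • Alternative method: attach() puts the data frame on the search path, so its columns can be referenced by name
attach(datai)
ssum1 <- read + write
savg1 <- (read + write) / 2
detach(datai)   # remove it from the search path when done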
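attach() is often discouraged because attached columns can mask, or be masked by, objects with the same name. A sketch of two base-R alternatives: with() evaluates an expression inside the data frame without touching the search path, and assigning with `$` stores the result as a column of the data frame rather than as a loose vector. The scores data frame is invented for illustration.

```r
# Invented scores standing in for datai$read and datai$write
scores <- data.frame(read = c(63, 44, 47), write = c(65, 50, 40))

# with() looks up read and write inside scores; nothing is left attached
ssum2 <- with(scores, read + write)

# Store the results as new columns of the data frame
scores$ssum <- scores$read + scores$write
scores$savg <- (scores$read + scores$write) / 2

# rowMeans() gives the same average and extends to more than two columns
scores$savg2 <- rowMeans(scores[, c("read", "write")])
scores
```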

Creating New Variables

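  • Another method: transform() returns a copy of the data frame with the new columns added (datai itself is unchanged)
dataim <- transform(datai, sum2 = read + write, smean = (read + write) / 2)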
head(datai, 2)
  id gender schtyp     prog read write math science socst
1  3   male public academic   63    65   48      63    56
2  4 female public academic   44    50   41      39    51
head(dataim,2)
  id gender schtyp     prog read write math science socst sum2 smean
1  3   male public academic   63    65   48      63    56  128    64
2  4 female public academic   44    50   41      39    51   94    47
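One limitation of transform() is that a column created in the call cannot be used by another argument of the same call, which is why smean repeats (read + write) instead of using sum2. Base R's within() runs its block as ordinary statements, so later lines can reuse earlier results. A sketch on invented data; note that within() does not guarantee the order in which new columns are appended, so refer to them by name.

```r
scores <- data.frame(read = c(63, 44), write = c(65, 50))

# transform(): smean must be written in terms of the original columns
t1 <- transform(scores, sum2 = read + write, smean = (read + write) / 2)

# within(): smean can build on sum2, created one line earlier
t2 <- within(scores, {
  sum2  <- read + write
  smean <- sum2 / 2
})
t2
```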

Sorting/Ranking/Ordering

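  • Use order(): it returns the row indices that would put a variable in ascending order

  • Sort by a variable by indexing the rows of the data frame with those indices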

max(dataim$write)
[1] 67
min(dataim$write)
[1] 31
head(dataim,4)
  id gender schtyp     prog read write math science socst sum2 smean
1  3   male public academic   63    65   48      63    56  128  64.0
2  4 female public academic   44    50   41      39    51   94  47.0
3  5   male public academic   47    40   43      45    31   87  43.5
4  6 female public academic   47    41   46      40    41   88  44.0
datais <- dataim[order(dataim$write),]
head(datais,4)
     id gender schtyp     prog read write math science socst sum2 smean
119 126   male public  general   42    31   57      47    51   73  36.5
143 153   male public vocation   39    31   40      39    51   70  35.0
15   18   male public vocation   50    33   49      44    36   83  41.5
81   86   male public  general   44    33   54      58    31   77  38.5
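Rows 119 and 143 tie on write, and order() keeps tied rows in their original order. Passing further variables to order() breaks ties explicitly. The sketch below uses invented rows; the last line uses the per-key decreasing argument, which order() supports with method = "radix".

```r
# Invented scores with ties on write
s <- data.frame(id    = c(126, 153, 18, 86),
                write = c(31, 31, 33, 33),
                read  = c(42, 39, 50, 44))

# Primary key: write (ascending); tie-breaker: read (ascending)
s[order(s$write, s$read), ]

# write ascending, read descending, via a decreasing flag for each key
s[order(s$write, s$read, decreasing = c(FALSE, TRUE), method = "radix"), ]
```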
  • Sort in descending order (negate a numeric variable inside order())
dataisd <- dataim[order(-dataim$write),]
head(dataisd,4)
   id gender schtyp     prog read write math science socst sum2 smean
29 32 female public vocation   50    67   66      66    56  117  58.5
36 39 female public academic   66    67   67      61    66  133  66.5
54 59 female public academic   65    67   63      55    71  132  66.0
63 68   male public academic   73    67   71      63    66  140  70.0
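The slide title mentions sorting, ranking, and ordering; base R has a separate function for each, and they return different things. A sketch on a small invented vector:

```r
x <- c(40, 31, 67, 31)

sort(x)    # the values themselves, ascending:  31 31 40 67
order(x)   # positions that put x in order:     2 4 1 3
rank(x)    # rank of each element, in place:    3.0 1.5 4.0 1.5 (ties averaged)

# order() is the one used for data frames, because its result indexes rows
identical(x[order(x)], sort(x))   # TRUE
```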

Reshaping

  • Transpose with t() (the result is a matrix, not a data frame)
dataA <- data.frame(id=c(1,1,2,2),
                    time=c(1,2,1,2),
                    x1=c(5,2,4,6))
dim(dataA)
[1] 4 3
trdataA <- t(dataA)
dim(trdataA)
[1] 3 4
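dataA is in "long" format: one row per id and time. Besides transposing, a common reshaping task is going from long to wide, with one row per id and one x1 column per time point. A sketch using base R's reshape(), with the same dataA as above:

```r
dataA <- data.frame(id = c(1, 1, 2, 2),
                    time = c(1, 2, 1, 2),
                    x1 = c(5, 2, 4, 6))

# t() converts the data frame to a matrix before transposing
is.matrix(t(dataA))   # TRUE

# Long to wide: idvar identifies rows, timevar supplies the column suffixes
wide <- reshape(dataA, idvar = "id", timevar = "time", direction = "wide")
wide   # columns: id, x1.1, x1.2
```

Unlike t(), reshape() keeps a data frame and names the new columns after the time values.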

THANKS